Abstract: We investigate the problem of Multiple Description
(MD) coding of discrete ergodic processes. We introduce
the notion of MD stationary coding, and characterize its 
relationship to the conventional block MD coding. In stationary 
coding, in addition to the two rate constraints normally consid- 
ered in the MD problem, we consider another rate constraint 
which reflects the conditional entropy of the process generated 
by the third decoder given the reconstructions of the two 
other decoders. The relationship that we establish between 
stationary and block MD coding enables us to devise a universal 
algorithm for MD coding of discrete ergodic sources, based on 
simulated annealing ideas that were recently proven useful for 
the standard rate distortion problem. 

I. INTRODUCTION 

Consider a packet network where a signal is to be de- 
scribed to several receivers. In a basic setup, the source 
is coded by a lossy encoder, and several copies of the
packet containing the source description are sent over the
network to make sure that each receiver gets at least one 
copy. Receiving more than one copy of these packets is 
not advantageous, because all the packets contain similar 
information. In contrast to this setup, one can think of a 
more reasonable scenario where the packets flooded into the 
network are not exactly the same; they are designed such
that receiving each one of them is sufficient for recovering 
the source, but receiving more packets improves the quality 
of the reconstructed signal. The described scenario is referred 
to as multiple description. 

The information-theoretic statement of the MD problem
and early results on it can be found in [1]-[4]. Even for
the seemingly simple case where there are only
two receivers, and the source is i.i.d., the characterization of
the achievable rate-distortion region is not known in general.
For this case, there are two well-known inner bounds due
to El Gamal-Cover [5] and Zhang-Berger [6]. There is also
a combined region, introduced in [7], which includes both
regions, but was recently shown to be no better than the
Zhang-Berger region [8]. In any case, full characterization of the
achievable region is not yet known.

Since the single-letter characterization of the achievable
rate-distortion region is not known in general even for
i.i.d. sources, there are few works on MD coding of
non-i.i.d. sources. The rate-distortion region of Gaussian
processes is derived in [10], and is shown to be achievable
using a scheme based on transform lattice quantization. In
[9], a multi-letter characterization of the achievable weighted
rate-distortion region of discrete stationary ergodic sources
is derived.

In this paper, we consider the MD of discrete ergodic 
processes where the distribution of the source is not known 
to the encoder and decoder. We introduce a universal al- 
gorithm which can asymptotically achieve any point in the 
achievable rate-distortion region. In order to get this result, 
we start by defining two notions of MD coding, namely, 
(i) conventional block coding, and (ii) stationary coding. 
In the normal block-coding MD, there are two rates but 
three reconstruction processes. In the stationary coding setup, 
there are three rates and three reconstruction processes. The 
additional rate corresponds to the conditional entropy rate
of the ergodic process reconstructed by the privileged
decoder, which receives two descriptions of the source, given 
the two other ergodic reconstruction processes. We show that 
these two setups are closely related and, in fact, characterize 
each other. The beneficial point of the new definition is 
that it enables us to devise a universal MD algorithm. The 
introduced algorithm takes advantage of simulated annealing 
which was used recently in [15] to design an asymptotically 
optimal universal algorithm for lossy compression of discrete 
ergodic sources. 

The outline of this paper is as follows. Section II presents
some preliminary notation and definitions. Section
III studies a simple example, which, as made clear later,
is closely related to the MD problem. Section IV formally
defines the MD problem and the two notions of block MD
coding and stationary MD coding, and shows the relationship
between the two. Based on these results, a universal MD
algorithm is described in Section V, and Section VI presents
some simulation results demonstrating the performance of the
proposed algorithm on simulated data. Finally,
Section VII discusses some future research directions.



II. NOTATION 

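Let $\mathbf{X} = \{X_i;\ \forall i \in \mathbb{N}^+\}$ be a stochastic process
defined on a probability space $(\mathbf{X}, \Sigma, \mu)$, where $\mu$ is a
probability measure defined on $\Sigma$, the $\sigma$-algebra generated
by the cylinder sets $\mathcal{C}$. For a process $\mathbf{X}$, let $\mathcal{X}$ denote the
alphabet set of $X_i$, which is assumed to be finite in this
paper. The shift operator $T : \mathcal{X}^\infty \to \mathcal{X}^\infty$ is defined by

$$(T\mathbf{x})_n = x_{n+1}, \quad \mathbf{x} \in \mathcal{X}^\infty,\ n \ge 1.$$

Moreover, for a stationary process $\mathbf{X}$, let $\bar{H}(\mathbf{X})$ denote its
entropy rate defined as

$$\bar{H}(\mathbf{X}) = \lim_{n \to \infty} H(X_{n+1} \mid X^n).$$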

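Let $\mathcal{X}$ and $\hat{\mathcal{X}}$ denote the source and reconstruction alphabets
respectively. For $y^n \in \mathcal{Y}^n$, define the matrix $\mathbf{m}(y^n)$ to
be the $|\mathcal{Y}| \times |\mathcal{Y}|^k$ matrix representing the $(k+1)$th order
empirical distribution of $y^n$, i.e., its $(\beta, \mathbf{b})$th element is
defined as

$$m_{\beta,\mathbf{b}}(y^n) = \frac{1}{n}\left|\left\{1 \le i \le n : y_{i-k}^{i-1} = \mathbf{b},\ y_i = \beta\right\}\right|, \qquad (1)$$

where $\mathbf{b} \in \mathcal{Y}^k$, and $\beta \in \mathcal{Y}$. In (1) and throughout we assume
a cyclic convention whereby $y_i = y_{n+i}$ for $i \le 0$. Let
$H_k(y^n)$ denote the conditional empirical entropy of order
$k$ induced by $y^n$, i.e.,

$$H_k(y^n) = H(Y_{k+1} \mid Y^k), \qquad (2)$$

where $Y^{k+1}$ on the right hand side of (2) is distributed
according to

$$P(Y^{k+1} = [\mathbf{b}, \beta]) = m_{\beta,\mathbf{b}}(y^n). \qquad (3)$$

The conditional empirical entropy in (2) can be expressed as
a function of $\mathbf{m}(y^n)$ as follows:

$$H_k(y^n) = \sum_{\mathbf{b}} \mathcal{H}\left(\mathbf{m}_{\cdot,\mathbf{b}}(y^n)\right) \mathbf{1}^T \mathbf{m}_{\cdot,\mathbf{b}}(y^n), \qquad (4)$$

where $\mathbf{1}$ and $\mathbf{m}_{\cdot,\mathbf{b}}(y^n)$ denote the all-ones column vector
of length $|\mathcal{Y}|$, and the column in $\mathbf{m}(y^n)$ corresponding to
$\mathbf{b}$ respectively. For a vector $\mathbf{v} = (v_1, \ldots, v_\ell)^T$ with
non-negative components, we let $\mathcal{H}(\mathbf{v})$ denote the entropy of the
random variable whose probability mass function (pmf) is
proportional to $\mathbf{v}$. Formally,

$$\mathcal{H}(\mathbf{v}) = \begin{cases} \sum_{i=1}^{\ell} \dfrac{v_i}{\|\mathbf{v}\|_1} \log \dfrac{\|\mathbf{v}\|_1}{v_i} & \text{if } \mathbf{v} \ne (0, \ldots, 0)^T, \\ 0 & \text{if } \mathbf{v} = (0, \ldots, 0)^T. \end{cases} \qquad (5)$$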
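Definitions (1)-(5) can be illustrated with a minimal Python sketch. It assumes base-2 logarithms and uses the cyclic convention stated above; the function name `empirical_conditional_entropy` is hypothetical and not part of the paper. The sketch accumulates the counts $n \cdot m_{\beta,\mathbf{b}}(y^n)$ directly rather than forming the matrix $\mathbf{m}(y^n)$.

```python
import math
from collections import Counter


def empirical_conditional_entropy(y, k):
    """Order-k conditional empirical entropy H_k(y^n) in bits (cyclic convention)."""
    n = len(y)
    # joint[(b, beta)] = n * m_{beta,b}(y^n): occurrences of symbol beta after context b
    joint = Counter()
    for i in range(n):
        # context y_{i-k}^{i-1}; negative indices wrap around cyclically
        context = tuple(y[(i + j) % n] for j in range(-k, 0))
        joint[(context, y[i])] += 1
    # context_totals[b] = n * 1^T m_{.,b}(y^n)
    context_totals = Counter()
    for (context, _), count in joint.items():
        context_totals[context] += count
    # Sum over b of (1^T m_{.,b}) * H(m_{.,b}), expanded term by term
    return sum(
        (count / n) * math.log2(context_totals[context] / count)
        for (context, _), count in joint.items()
    )


alternating = [0, 1] * 8
print(empirical_conditional_entropy(alternating, 0))  # zeroth order: symbols look uniform
print(empirical_conditional_entropy(alternating, 1))  # first order: fully predictable
```

For the alternating sequence, the zeroth-order empirical entropy is one bit per symbol, while the first-order value is zero because each symbol is determined by its predecessor.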

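Let $\mathbf{m}(w^n \mid y^n, z^n)$ denote the conditional $k$th order
empirical distribution of $w^n$ given $y^n$ and $z^n$, whose
$(\beta, \mathbf{b}_0, \mathbf{b}_1, \mathbf{b}_2)$th element is defined as

$$m_{\beta,\mathbf{b}_0,\mathbf{b}_1,\mathbf{b}_2} = \frac{1}{n}\left|\left\{1 \le i \le n : w_{i-k}^{i-1} = \mathbf{b}_0,\ w_i = \beta,\ y_{i-k_1}^{i+k_1} = \mathbf{b}_1,\ z_{i-k_1}^{i+k_1} = \mathbf{b}_2\right\}\right|, \qquad (6)$$

where $\beta \in \mathcal{W}$, $\mathbf{b}_0 \in \mathcal{W}^k$, $\mathbf{b}_1 \in \mathcal{Y}^{2k_1+1}$, and $\mathbf{b}_2 \in \mathcal{Z}^{2k_1+1}$.
Now define the conditional empirical entropy of $w^n$ given
$y^n$ and $z^n$, $H_{k,k_1}(w^n \mid y^n, z^n)$, in terms of $\mathbf{m}(w^n \mid y^n, z^n)$ as

$$H_{k,k_1}(w^n \mid y^n, z^n) = \sum_{\mathbf{b}_0,\mathbf{b}_1,\mathbf{b}_2} \mathbf{1}^T \mathbf{m}_{\cdot,\mathbf{b}_0,\mathbf{b}_1,\mathbf{b}_2}\, \mathcal{H}\left(\mathbf{m}_{\cdot,\mathbf{b}_0,\mathbf{b}_1,\mathbf{b}_2}\right). \qquad (7)$$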
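The conditional quantity in (6)-(7) can be sketched in the same way. The code below is illustrative only: it assumes base-2 logarithms, a causal context of length $k$ in $w^n$, and two-sided windows of half-width $k_1$ in $y^n$ and $z^n$, all taken cyclically. The function name is hypothetical.

```python
import math
from collections import Counter


def conditional_empirical_entropy(w, y, z, k, k1):
    """H_{k,k1}(w^n | y^n, z^n) in bits, with cyclic indexing."""
    n = len(w)
    joint = Counter()
    for i in range(n):
        b0 = tuple(w[(i + j) % n] for j in range(-k, 0))        # w_{i-k}^{i-1}
        b1 = tuple(y[(i + j) % n] for j in range(-k1, k1 + 1))  # y_{i-k1}^{i+k1}
        b2 = tuple(z[(i + j) % n] for j in range(-k1, k1 + 1))  # z_{i-k1}^{i+k1}
        joint[(b0, b1, b2, w[i])] += 1
    totals = Counter()
    for key, count in joint.items():
        totals[key[:3]] += count
    # Sum over contexts of (context weight) * (entropy of the symbol given the context)
    return sum(
        (count / n) * math.log2(totals[key[:3]] / count)
        for key, count in joint.items()
    )


y = [0, 0, 1, 1] * 4
z = [0, 1, 0, 1] * 4
w = [a ^ b for a, b in zip(y, z)]  # w_i is a function of (y_i, z_i)
print(conditional_empirical_entropy(w, y, z, k=0, k1=0))
```

In this toy case the result is zero, since each $w_i$ is determined by the side-information symbols at the same position.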



Fig. 1. Example setup (link labels: $R_1$ bits, $R_2$ bits)



III. SIMPLE EXAMPLE 

Before formally defining the MD problem, consider the
setup shown in Fig. 1. This example is meant to provide
some insight into the MD problem. Also, the results of
this section will be used in the proof of Theorem 2 in
Appendix A. Here $S_1 \in \mathcal{S}_1$, $S_2 \in \mathcal{S}_2$ and $S_0 \in \mathcal{S}_0$
denote three correlated discrete-valued random variables, and
$(S_1, S_2, S_0) \sim P(s_1, s_2, s_0)$. The Encoder's goal is to send
$R_1$ bits to Decoder 1, and $R_2$ bits to Decoder 2 such
that Decoders 1 and 2 are able to reconstruct $S_1$ and $S_2$
respectively. Moreover, the transmitted bits are required to
be such that receiving both of them enables Decoder 0 to
reconstruct $S_0$. In all three cases, the probability of error
is assumed to be zero. Let $M_1 \in \{1, \ldots, 2^{R_1}\}$, and $M_2 \in
\{1, \ldots, 2^{R_2}\}$ denote the messages sent to Decoders 1 and
2 respectively. The question is to find the set of achievable
rates $(R_1, R_2)$. The following theorem states some necessary
conditions for $(R_1, R_2)$ to be achievable. It is very similar
to Theorem 2 of [5], and the two theorems are in fact easily
seen to prove each other. The version we give here is most
suited for our later needs.

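Theorem 1: For any achievable rate $(R_1, R_2)$ for the setup
shown in Fig. 1,

$$\begin{aligned} R_1 &\ge H(S_1), \\ R_2 &\ge H(S_2), \\ R_1 + R_2 &\ge H(S_1) + H(S_2) + H(S_0 \mid S_1, S_2). \end{aligned} \qquad (8)$$

Proof: $R_1 \ge H(M_1) \ge H(S_1)$ and $R_2 \ge H(M_2) \ge H(S_2)$ follow from
Shannon's lossless coding theorem, since $M_i$ takes at most $2^{R_i}$
values and $S_i$ must be recoverable from $M_i$. It is also clear that we
should have

$$\begin{aligned} R_1 + R_2 &\ge H(S_1, S_2, S_0) \\ &= H(S_1, S_2) + H(S_0 \mid S_1, S_2). \end{aligned} \qquad (9)$$

But, perhaps somewhat counterintuitively, (9) is just an outer
bound, and is not enough. $R_1 + R_2$ in fact satisfies the tighter
condition stated in (8), as can be seen via the following chain
of inequalities:

$$\begin{aligned} R_1 + R_2 &\ge H(M_1) + H(M_2) \\ &= H(M_1, S_1) + H(M_2, S_2) \\ &= H(S_1) + H(M_1 \mid S_1) + H(S_2) + H(M_2 \mid S_2) \\ &\ge H(S_1) + H(S_2) + H(M_1 \mid S_1, S_2) + H(M_2 \mid S_1, S_2) \\ &\ge H(S_1) + H(S_2) + H(M_1, M_2 \mid S_1, S_2) \\ &\ge H(S_1) + H(S_2) + H(M_1, M_2, S_0 \mid S_1, S_2) \\ &\ge H(S_1) + H(S_2) + H(S_0 \mid S_1, S_2). \end{aligned} \qquad (10)$$

Here the first equality holds because $S_i$ is a deterministic function of
$M_i$, and the sixth line holds (with equality) because $S_0$ is a
deterministic function of $(M_1, M_2)$.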
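The gap between the sum-rate bounds (9) and (8) equals $H(S_1) + H(S_2) - H(S_1, S_2) = I(S_1; S_2)$, so it is largest when $S_1$ and $S_2$ are strongly dependent. The Python sketch below evaluates both bounds for a toy joint pmf; the pmf, the function names, and the outcome ordering $(s_1, s_2, s_0)$ are illustrative assumptions.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(pmf, indices):
    """Marginal pmf of the coordinates listed in `indices`."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in indices)] += p
    return out


def sum_rate_bounds(pmf):
    """Return (bound (9), bound (8)) for outcomes ordered as (s1, s2, s0)."""
    h1 = entropy(marginal(pmf, (0,)))
    h2 = entropy(marginal(pmf, (1,)))
    h12 = entropy(marginal(pmf, (0, 1)))
    h120 = entropy(pmf)
    h0_given_12 = h120 - h12  # chain rule
    return h120, h1 + h2 + h0_given_12


# Toy example: S1 = S2 = S0 is a single fair bit.
pmf = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
weak, tight = sum_rate_bounds(pmf)
print(weak, tight)
```

Here (9) only requires one bit in total, whereas (8) requires two: each decoder must be able to recover the same bit on its own, so the bit has to be sent on both links.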



IV. MULTIPLE DESCRIPTION PROBLEM 

Consider the basic setup of the MD problem shown in Fig. 2.
In this figure, $X^n$ is generated by a stationary ergodic source
$\mathbf{X}$.

Fig. 2. MD coding setup

Remark: In order to see the connection between the
example described in Section III and the MD problem, note
that letting $S_i = \hat{X}_i^n$, $i \in \{1, 2\}$, and $S_0 = \hat{X}_0^n$, the
MD problem can be described as the problem of describing
$(S_1, S_2, S_0)$ to the respective receivers error-free. In other
words, for each code design, we have a problem equivalent
to the one described in Section III.

A. Block coding: 

The MD coding problem can be described in terms of an encoding
mapping $f$, and decoding mappings $(g_1, g_2, g_0)$ as follows:

1) $f : \mathcal{X}^n \to [1 : 2^{nR_1}] \times [1 : 2^{nR_2}]$,

2) $g_i : [1 : 2^{nR_i}] \to \hat{\mathcal{X}}^n$, for $i = 1, 2$,

3) $g_0 : [1 : 2^{nR_1}] \times [1 : 2^{nR_2}] \to \hat{\mathcal{X}}^n$,

4) $(M_1, M_2) = f(X^n)$,

5) $\hat{X}_i^n = g_i(M_i)$, for $i = 1, 2$,

6) $\hat{X}_0^n = g_0(M_1, M_2)$.

$(R_1, R_2, D_1, D_2, D_0)$ is said to be achievable for this
setup, if there exists a sequence of codes
$(f^{(n)}, g_1^{(n)}, g_2^{(n)}, g_0^{(n)})$ such that

$$\limsup_{n} \mathrm{E}\, d_n(X^n, \hat{X}_i^n) \le D_i, \quad \text{for } i = 1, 2,$$

$$\limsup_{n} \mathrm{E}\, d_n(X^n, \hat{X}_0^n) \le D_0.$$

Let $\mathcal{R}^B$ be the set of all $(R_1, R_2, D_1, D_2, D_0)$ that are
achievable by block MD coding of source $\mathbf{X}$.

B. Stationary coding: 

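Define $(R_{11}, R_{22}, R_0, D_1, D_2, D_0)$ to be achievable by
stationary coding of source $\mathbf{X}$, if for any $\epsilon > 0$,
there exist processes $\hat{\mathbf{X}}_1^{(\epsilon)}$, $\hat{\mathbf{X}}_2^{(\epsilon)}$ and $\hat{\mathbf{X}}_0^{(\epsilon)}$ such that
$(\mathbf{X}, \hat{\mathbf{X}}_1^{(\epsilon)}, \hat{\mathbf{X}}_2^{(\epsilon)}, \hat{\mathbf{X}}_0^{(\epsilon)})$ are jointly stationary ergodic
processes, and

$$\bar{H}(\hat{\mathbf{X}}_1^{(\epsilon)}) \le R_{11} + \epsilon \qquad (11)$$

$$\bar{H}(\hat{\mathbf{X}}_2^{(\epsilon)}) \le R_{22} + \epsilon \qquad (12)$$

$$\bar{H}(\hat{\mathbf{X}}_0^{(\epsilon)} \mid \hat{\mathbf{X}}_1^{(\epsilon)}, \hat{\mathbf{X}}_2^{(\epsilon)}) \le R_0 + \epsilon \qquad (13)$$

$$\mathrm{E}\, d(X_0, \hat{X}_{1,0}^{(\epsilon)}) \le D_1 + \epsilon \qquad (14)$$

$$\mathrm{E}\, d(X_0, \hat{X}_{2,0}^{(\epsilon)}) \le D_2 + \epsilon \qquad (15)$$

$$\mathrm{E}\, d(X_0, \hat{X}_{0,0}^{(\epsilon)}) \le D_0 + \epsilon. \qquad (16)$$

Let $\mathcal{R}^P$ denote the set of all $(R_{11}, R_{22}, R_0, D_1, D_2, D_0)$
that are achievable by stationary MD coding of source $\mathbf{X}$.
The following theorem characterizes $\mathcal{R}^B$ in terms of $\mathcal{R}^P$.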



Theorem 2: Let $\mathbf{X}$ be a stationary ergodic source.
For any $(R_1, R_2, D_1, D_2, D_0) \in \mathcal{R}^B$, there exists
$(R_{11}, R_{22}, R_0, D_1, D_2, D_0) \in \mathcal{R}^P$ such that

$$R_{11} \le R_1 \qquad (17)$$

$$R_{22} \le R_2 \qquad (18)$$

$$R_{11} + R_{22} + R_0 \le R_1 + R_2. \qquad (19)$$

On the other hand, if $(R_{11}, R_{22}, R_0, D_1, D_2, D_0) \in \mathcal{R}^P$,
any point $(R_1, R_2, D_1, D_2, D_0)$ satisfying (17)-(19) belongs
to $\mathcal{R}^B$.

Proof: Refer to Appendix A for an outline of the proof.

■

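Remark: The theorem implies that $\mathcal{R}^B$ can be characterized
as the set of $(R_1, R_2, D_1, D_2, D_0)$ such that

$$\begin{aligned} \bar{H}(\hat{\mathbf{X}}_1) &\le R_1, \\ \bar{H}(\hat{\mathbf{X}}_2) &\le R_2, \\ \bar{H}(\hat{\mathbf{X}}_1) + \bar{H}(\hat{\mathbf{X}}_2) + \bar{H}(\hat{\mathbf{X}}_0 \mid \hat{\mathbf{X}}_1, \hat{\mathbf{X}}_2) &\le R_1 + R_2, \end{aligned}$$

for some jointly stationary ergodic processes
$(\mathbf{X}, \hat{\mathbf{X}}_1, \hat{\mathbf{X}}_2, \hat{\mathbf{X}}_0)$ which satisfy (14)-(16).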

V. UNIVERSAL MULTIPLE DESCRIPTION 
CODING 

Equipped with the characterization of the achievable re- 
gion established in the previous section, we now turn to our 
construction of a universal scheme for this problem. Consider 
the following MD algorithm for the setup shown in Fig. 2. 
Let

$$(\hat{x}_1^n, \hat{x}_2^n, \hat{x}_0^n) = \arg\min_{(y^n, z^n, w^n)} \left[\gamma_1 H_k(y^n) + \gamma_2 H_k(z^n) + \gamma_0 H_{k,k_1}(w^n \mid y^n, z^n) + \alpha_1 d_n(x^n, y^n) + \alpha_2 d_n(x^n, z^n) + \alpha_0 d_n(x^n, w^n)\right]. \qquad (20)$$

Assume that $\gamma_i > 0$ and $\alpha_i > 0$, for $i \in \{0, 1, 2\}$, are
given Lagrangian coefficients. Also, assume that $k_1 < k =
o(\log n)$ such that $k_1 \to \infty$ as $n \to \infty$.

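Theorem 3: Let $\mathbf{X}$ be a stationary ergodic process, and
let $(\hat{X}_1^n, \hat{X}_2^n, \hat{X}_0^n)$ denote the output of the above algorithm to
input sequence $X^n$. Then,

$$\limsup_{n} \left[\gamma_1 H_k(\hat{X}_1^n) + \gamma_2 H_k(\hat{X}_2^n) + \gamma_0 H_{k,k_1}(\hat{X}_0^n \mid \hat{X}_1^n, \hat{X}_2^n) + \alpha_1 d_n(X^n, \hat{X}_1^n) + \alpha_2 d_n(X^n, \hat{X}_2^n) + \alpha_0 d_n(X^n, \hat{X}_0^n)\right]$$

$$= \min \left[\gamma_1 R_{11} + \gamma_2 R_{22} + \gamma_0 R_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_0 D_0\right] \qquad (21)$$

almost surely, where the minimization is over all
$(R_{11}, R_{22}, R_0, D_1, D_2, D_0) \in \mathcal{R}^P$.

The proof of Theorem 3 is presented in Appendix B.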

After finding $(\hat{x}_1^n, \hat{x}_2^n, \hat{x}_0^n)$, $\hat{x}_1^n$ and $\hat{x}_2^n$ will be described to
Decoders 1 and 2 respectively using one of the well-known
universal lossless compression algorithms, e.g., the Lempel-Ziv
algorithm. Then the Encoder forms a description of $\hat{x}_0^n$
conditioned on knowing $\hat{x}_1^n$ and $\hat{x}_2^n$ using the conditional
Lempel-Ziv algorithm or some other universal algorithm for lossless
coding with side information [11]. A portion $0 \le \theta \le 1$ of
these bits will be included in the message $M_1$ and the rest
in message $M_2$.
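The bit-splitting step can be made concrete with a small Python sketch. Given a stationary-coding rate triple $(R_{11}, R_{22}, R_0)$ and a split fraction $\theta$, it forms $R_1 = R_{11} + \theta R_0$ and $R_2 = R_{22} + (1-\theta) R_0$ and checks conditions (17)-(19). The function names are hypothetical, and the numerical values are the empirical entropies reported in Section VI, used here only as illustrative inputs rather than as actual code lengths.

```python
def split_rates(r11, r22, r0, theta):
    """Assign a fraction theta of the refinement rate R0 to message 1, the rest to message 2."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    r1 = r11 + theta * r0
    r2 = r22 + (1.0 - theta) * r0
    return r1, r2


def satisfies_17_to_19(r1, r2, r11, r22, r0, tol=1e-12):
    """Check R11 <= R1, R22 <= R2 and R11 + R22 + R0 <= R1 + R2 (up to rounding)."""
    return (
        r11 <= r1 + tol
        and r22 <= r2 + tol
        and r11 + r22 + r0 <= r1 + r2 + tol
    )


r11, r22, r0 = 0.5503, 0.5586, 0.0038
r1, r2 = split_rates(r11, r22, r0, theta=0.5)
print(r1, r2, satisfies_17_to_19(r1, r2, r11, r22, r0))
```

Any $\theta \in [0, 1]$ meets (19) with equality under this split, so $\theta$ only moves the point along a line of constant sum rate.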

For finding an approximate solution of (20) instead of
doing the required exhaustive search directly, as done in [15],
one can employ simulated annealing [14]. To do this, we
assign a cost to each $(y^n, z^n, w^n) \in \hat{\mathcal{X}}^n \times \hat{\mathcal{X}}^n \times \hat{\mathcal{X}}^n$ as
follows:

$$\mathcal{E}(y^n, z^n, w^n) := \gamma_1 H_k(y^n) + \gamma_2 H_k(z^n) + \gamma_0 H_{k,k_1}(w^n \mid y^n, z^n) + \alpha_1 d_n(x^n, y^n) + \alpha_2 d_n(x^n, z^n) + \alpha_0 d_n(x^n, w^n),$$

and then define the Boltzmann probability distribution at
temperature $T = 1/\beta$ as

$$p_\beta(y^n, z^n, w^n) := \frac{1}{Z} e^{-\beta \mathcal{E}(y^n, z^n, w^n)}, \qquad (22)$$

where $Z$ is a normalizing constant. Sampling from this
distribution at a very low temperature yields $(\hat{X}_1^n, \hat{X}_2^n, \hat{X}_0^n)$
with energy close to the minimum possible energy, i.e.,

$$\mathcal{E}(\hat{X}_1^n, \hat{X}_2^n, \hat{X}_0^n) \approx \min_{(y^n, z^n, w^n)} \mathcal{E}(y^n, z^n, w^n). \qquad (23)$$
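The concentration behaviour behind (22)-(23) can be seen on a toy state space. The sketch below uses four hypothetical energy values (not taken from the paper) and evaluates the Boltzmann distribution at increasing $\beta$; subtracting the minimum energy before exponentiating is a standard numerical safeguard and does not change the distribution.

```python
import math


def boltzmann(energies, beta):
    """Boltzmann pmf proportional to exp(-beta * E) over a finite list of energies."""
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # normalizing constant (after the shift by e_min)
    return [w / z for w in weights]


energies = [1.0, 1.2, 2.0, 3.5]  # hypothetical costs of four candidate states
for beta in (0.0, 1.0, 10.0, 100.0):
    pmf = boltzmann(energies, beta)
    print(beta, [round(p, 4) for p in pmf])
```

At $\beta = 0$ the distribution is uniform, and as $\beta$ grows nearly all of the mass moves onto the minimum-energy state.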



Since sampling from (22) at low temperatures is almost as 
hard as doing the exhaustive search, we turn to simulated 
annealing (SA), which is a known method for solving discrete
optimization problems. The SA procedure works as follows:
it first defines a Boltzmann distribution over the optimization
space, and then tries to sample from the defined distribution
while gradually decreasing the temperature from some high
$T$ to zero according to a properly chosen annealing schedule.

Given $\mathcal{E}(y^n, z^n, w^n)$, similarly as in [15], the number of
computations required for calculating
$\mathcal{E}(y^{i-1} a\, y_{i+1}^n, z^{i-1} b\, z_{i+1}^n, w^{i-1} c\, w_{i+1}^n)$, when only one of the
following is true: $a \ne y_i$, $b \ne z_i$, or $c \ne w_i$, for some
$i \in \{1, \ldots, n\}$ and $a, b, c \in \hat{\mathcal{X}}$, is linear in $k$ and $k_1$, and is
independent of $n$. Therefore, this energy function lends itself
to a heat bath type algorithm as simply and naturally as the
one in the original setting of [15] did.

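Now consider Algorithm 1 which is based on the
Gibbs sampling method for sampling from $p_\beta$, and let
$(\hat{X}_{1,r}^n, \hat{X}_{2,r}^n, \hat{X}_{0,r}^n)$ denote its random outcome for the input
sequence $X^n$ after $r$ iterations$^1$, when taking $k_1 = k_{1,n}$,
$k = k_n$ and $\beta = \{\beta_t\}_t$ to be deterministic sequences
satisfying $k_{1,n} = o(\log n)$, $k_n = o(\log n)$ such that $k, k_1 \to
\infty$ as $n \to \infty$, and

$$\beta_t = \frac{1}{T_0^{(n)}} \log\left(\left\lfloor \frac{t}{n} \right\rfloor + 1\right),$$

for some $T_0^{(n)} > n \max(\Delta_1, \Delta_2, \Delta_0)$, where

$$\Delta_1 = \max \left|\mathcal{E}(y^{i-1} a\, y_{i+1}^n, z^n, w^n) - \mathcal{E}(y^{i-1} b\, y_{i+1}^n, z^n, w^n)\right|, \qquad (24)$$

with the maximum taken over $i \in \{1, \ldots, n\}$, $y^{i-1} \in \hat{\mathcal{X}}^{i-1}$, $y_{i+1}^n \in \hat{\mathcal{X}}^{n-i}$,
$a, b \in \hat{\mathcal{X}}$, $z^n \in \hat{\mathcal{X}}^n$, $w^n \in \hat{\mathcal{X}}^n$,

$$\Delta_2 = \max \left|\mathcal{E}(y^n, z^{i-1} a\, z_{i+1}^n, w^n) - \mathcal{E}(y^n, z^{i-1} b\, z_{i+1}^n, w^n)\right|, \qquad (25)$$

with the maximum taken over $i \in \{1, \ldots, n\}$, $z^{i-1} \in \hat{\mathcal{X}}^{i-1}$, $z_{i+1}^n \in \hat{\mathcal{X}}^{n-i}$,
$a, b \in \hat{\mathcal{X}}$, $y^n \in \hat{\mathcal{X}}^n$, $w^n \in \hat{\mathcal{X}}^n$, and

$$\Delta_0 = \max \left|\mathcal{E}(y^n, z^n, w^{i-1} a\, w_{i+1}^n) - \mathcal{E}(y^n, z^n, w^{i-1} b\, w_{i+1}^n)\right|, \qquad (26)$$

with the maximum taken over $i \in \{1, \ldots, n\}$, $w^{i-1} \in \hat{\mathcal{X}}^{i-1}$, $w_{i+1}^n \in \hat{\mathcal{X}}^{n-i}$,
$a, b \in \hat{\mathcal{X}}$, $y^n \in \hat{\mathcal{X}}^n$, $z^n \in \hat{\mathcal{X}}^n$.

$^1$ Here and throughout it is implicit that the randomness used in the
algorithms is independent of the source, and the randomization variables
used at each drawing are independent of each other.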
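The annealing schedule $\beta_t = \frac{1}{T_0^{(n)}} \log(\lfloor t/n \rfloor + 1)$ is simple to tabulate. The Python sketch below assumes natural logarithms and takes the constant $T_0^{(n)}$ as a user-supplied argument `t0`; the function name and the sample values are hypothetical.

```python
import math


def beta_schedule(t, n, t0):
    """Inverse temperature at iteration t (t >= 1) for block length n and constant t0 > 0."""
    return math.log(t // n + 1) / t0


n, t0 = 10, 5.0  # illustrative values only
for t in (1, 9, 10, 20, 100, 1000):
    print(t, round(beta_schedule(t, n, t0), 4))
```

The schedule is piecewise constant: it stays at zero for the first $n - 1$ iterations (uniform sampling), then increases by one logarithmic step every $n$ iterations, so the temperature decreases slowly.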



As discussed before, the computational complexity of the
algorithm at each iteration is independent of $n$ and linear in
$k$ and $k_1$. Following exactly the same steps as in the proof
of Theorem 2 in [15], we can prove the following theorem
which establishes universal optimality of Algorithm 1.

Theorem 4: For any ergodic process $\mathbf{X}$,

$$\lim_{n \to \infty} \lim_{r \to \infty} \mathcal{E}(\hat{X}_{1,r}^n, \hat{X}_{2,r}^n, \hat{X}_{0,r}^n) = \min \left[\gamma_1 R_{11} + \gamma_2 R_{22} + \gamma_0 R_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_0 D_0\right] \qquad (27)$$

almost surely, where the minimization is over all
$(R_{11}, R_{22}, R_0, D_1, D_2, D_0) \in \mathcal{R}^P(\mathbf{X})$.

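Algorithm 1 Generating the reconstruction sequences

Require: $x^n$, $k_1$, $k$, $\{\alpha_i\}_{i=0}^{2}$, $\{\gamma_i\}_{i=0}^{2}$, $\{\beta_t\}_{t=1}^{r}$, $r$
Ensure: reconstruction sequences $(\hat{x}_1^n, \hat{x}_2^n, \hat{x}_0^n)$

  $y^n \leftarrow x^n$
  $z^n \leftarrow x^n$
  $w^n \leftarrow x^n$
  for $t = 1$ to $r$ do
    Draw an integer $i \in \{1, \ldots, n\}$ uniformly at random
    For each $y \in \hat{\mathcal{X}}$ compute $q_1(y) = p_{\beta_t}(Y_i = y \mid Y^{n \setminus i} = y^{n \setminus i}, Z^n = z^n, W^n = w^n)$
    Update $y^n$ by letting $y_i = V_1$, where $V_1 \sim q_1$
    For each $z \in \hat{\mathcal{X}}$ compute $q_2(z) = p_{\beta_t}(Z_i = z \mid Y^n = y^n, Z^{n \setminus i} = z^{n \setminus i}, W^n = w^n)$
    Update $z^n$ by letting $z_i = V_2$, where $V_2 \sim q_2$
    For each $w \in \hat{\mathcal{X}}$ compute $q_0(w) = p_{\beta_t}(W_i = w \mid Y^n = y^n, Z^n = z^n, W^{n \setminus i} = w^{n \setminus i})$
    Update $w^n$ by letting $w_i = V_0$, where $V_0 \sim q_0$
    Update $\mathbf{m}(y^n)$, $\mathbf{m}(z^n)$ and $\mathbf{m}(w^n \mid y^n, z^n)$
  end for
  $\hat{x}_1^n \leftarrow y^n$
  $\hat{x}_2^n \leftarrow z^n$
  $\hat{x}_0^n \leftarrow w^n$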
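A minimal Python sketch of the heat-bath loop in Algorithm 1 is given below. It makes several simplifying assumptions that are not taken from the paper: the distortion $d_n$ is normalized Hamming distance, logarithms are base 2, and the full energy is recomputed from scratch for every candidate symbol (cost linear in $n$ per evaluation) instead of incrementally updating the count matrices as the text describes. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def cond_entropy(counts, n):
    """Sum over (context, symbol) pairs of (c/n) * log2(N_context / c)."""
    totals = Counter()
    for (ctx, _), c in counts.items():
        totals[ctx] += c
    return sum((c / n) * math.log2(totals[ctx] / c) for (ctx, _), c in counts.items())


def h_k(y, k):
    """Order-k conditional empirical entropy with cyclic indexing."""
    n = len(y)
    counts = Counter((tuple(y[(i + j) % n] for j in range(-k, 0)), y[i]) for i in range(n))
    return cond_entropy(counts, n)


def h_k_k1(w, y, z, k, k1):
    """Conditional empirical entropy of w given two-sided windows of y and z."""
    n = len(w)
    counts = Counter()
    for i in range(n):
        ctx = (tuple(w[(i + j) % n] for j in range(-k, 0)),
               tuple(y[(i + j) % n] for j in range(-k1, k1 + 1)),
               tuple(z[(i + j) % n] for j in range(-k1, k1 + 1)))
        counts[(ctx, w[i])] += 1
    return cond_entropy(counts, n)


def hamming(x, y):
    """Normalized Hamming distance (assumed distortion measure)."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def energy(x, y, z, w, k, k1, gamma, alpha):
    """Cost of (y, z, w); gamma and alpha are ordered as (index 1, index 2, index 0)."""
    return (gamma[0] * h_k(y, k) + gamma[1] * h_k(z, k)
            + gamma[2] * h_k_k1(w, y, z, k, k1)
            + alpha[0] * hamming(x, y) + alpha[1] * hamming(x, z)
            + alpha[2] * hamming(x, w))


def gibbs_md(x, k, k1, gamma, alpha, betas, alphabet=(0, 1), rng=None):
    """Run one heat-bath update of y_i, z_i, w_i per entry of `betas`."""
    rng = rng or random.Random(0)
    n = len(x)
    y, z, w = list(x), list(x), list(x)
    for beta in betas:
        i = rng.randrange(n)  # position to resample
        for seq in (y, z, w):
            costs = []
            for s in alphabet:
                seq[i] = s
                costs.append(energy(x, y, z, w, k, k1, gamma, alpha))
            c_min = min(costs)  # shift for numerical stability
            weights = [math.exp(-beta * (c - c_min)) for c in costs]
            seq[i] = rng.choices(alphabet, weights=weights)[0]
    return y, z, w


src = random.Random(1)
x = [src.randint(0, 1) for _ in range(32)]
betas = [math.log(t // 32 + 1) / 0.05 for t in range(1, 321)]  # 0.05 is an arbitrary constant
y, z, w = gibbs_md(x, k=2, k1=1, gamma=(1, 1, 1), alpha=(1, 1, 1), betas=betas)
print(energy(x, y, z, w, 2, 1, (1, 1, 1), (1, 1, 1)))
```

The sketch is only practical for short sequences; the per-iteration cost that is independent of $n$, as claimed in the text, requires maintaining $\mathbf{m}(y^n)$, $\mathbf{m}(z^n)$ and $\mathbf{m}(w^n \mid y^n, z^n)$ and updating only the entries affected by a single-symbol change.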



VI. SIMULATION RESULTS 



In this section, we present some results from an actual
implementation of the algorithm described in Section V. The
simulated source here is a symmetric binary Markov source
with transition probability $p = 0.2$. The considered block
length is $n = 10^4$, and the context sizes are $k = 5$ and
$k_1 = 1$. The annealing schedule was chosen according to



where $t$ is the iteration number. The number of iterations, $r$,
is equal to $50n$. The algorithm with the specified parameters,
for $\gamma_1 = \gamma_2 = \gamma_0 = \alpha_1 = \alpha_2 = \alpha_0 = 1$, achieves the
following set of rates and distortions:

$H_k(\hat{x}_1^n) = 0.5503$,
$H_k(\hat{x}_2^n) = 0.5586$,
$H_{k,k_1}(\hat{x}_0^n \mid \hat{x}_1^n, \hat{x}_2^n) = 0.0038$,
$d_n(x^n, \hat{x}_1^n) = 0.0505$,
$d_n(x^n, \hat{x}_2^n) = 0.0483$,
$d_n(x^n, \hat{x}_0^n) = 0.0036$.

Fig. 3 shows how the total cost decreases in this case as the
number of iterations increases. One interesting thing to note
here is that although the sequences $\hat{x}_1^n$ and $\hat{x}_2^n$ have almost
the same distance from the original sequence $x^n$, they are far
from being equal. In fact, $d_n(\hat{x}_1^n, \hat{x}_2^n) = 0.0966$, which, given
$d_n(x^n, \hat{x}_1^n) = 0.0505$ and $d_n(x^n, \hat{x}_2^n) = 0.0483$, suggests
that they are almost maximally distant.

As another example, consider the case where $n = 5 \times
10^4$ and $\alpha_1 = \alpha_2 = 2$. The rest of the parameters are left
unchanged. The achieved point in this case is

$H_k(\hat{x}_1^n) = 0.6091$,
$H_k(\hat{x}_2^n) = 0.5951$,
$H_{k,k_1}(\hat{x}_0^n \mid \hat{x}_1^n, \hat{x}_2^n) = 0$,
$d_n(x^n, \hat{x}_1^n) = 0.0200$,
$d_n(x^n, \hat{x}_2^n) = 0.0240$,
$d_n(x^n, \hat{x}_0^n) = 0.0010$.



Here, $H_{k,k_1}(\hat{x}_0^n \mid \hat{x}_1^n, \hat{x}_2^n) = 0$ implies that $\hat{x}_{0,i}$ is a
deterministic function of its context, $(\hat{x}_{0,i-k}^{i-1}, \hat{x}_{1,i-k_1}^{i+k_1}, \hat{x}_{2,i-k_1}^{i+k_1})$. This
of course does not mean that no additional rate is required
for describing $\hat{x}_0^n$ when the decoder already knows $\hat{x}_1^n$ and
$\hat{x}_2^n$, because this deterministic mapping itself is not known
to the decoder beforehand. Here again $\hat{x}_1^n$ and $\hat{x}_2^n$ are almost
maximally distant because $d_n(\hat{x}_1^n, \hat{x}_2^n) = 0.0436$.
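The phrase "almost maximally distant" can be quantified under the assumption that $d_n$ is a metric such as normalized Hamming distance (an assumption of this sketch). The triangle inequality then gives $d_n(\hat{x}_1^n, \hat{x}_2^n) \le d_n(x^n, \hat{x}_1^n) + d_n(x^n, \hat{x}_2^n)$, and the Python snippet below computes how close each experiment comes to that upper bound using the reported values. The function name is hypothetical.

```python
def separation_ratio(d1, d2, d12):
    """Ratio of the observed distance d12 to the triangle-inequality bound d1 + d2."""
    return d12 / (d1 + d2)


# (d_n(x, x1), d_n(x, x2), d_n(x1, x2)) as reported for the two experiments
experiments = {
    "n = 1e4": (0.0505, 0.0483, 0.0966),
    "n = 5e4": (0.0200, 0.0240, 0.0436),
}
for name, (d1, d2, d12) in experiments.items():
    print(name, round(separation_ratio(d1, d2, d12), 3))
```

Both ratios are close to one, meaning the positions where $\hat{x}_1^n$ differs from $x^n$ and those where $\hat{x}_2^n$ differs from $x^n$ are nearly disjoint.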

Note that the fundamental performance limits are unknown 
even for memoryless sources and, a fortiori, for the Markov 
source in our experiment. Thus the performance of our 
algorithm cannot be compared to the corresponding optimum 
performance. The results of the preceding section, however, 
imply that our algorithm attains that performance in the 
limit of many iterations and large block length. Thus, the
performance attained by our algorithm can alternatively be
viewed as approximating the unknown optimum.




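[Plot: total cost versus number of iterations]

Fig. 3. Reduction in the cost. At the end
of the process, the final achieved point is:
$(H_k(\hat{x}_1^n), H_k(\hat{x}_2^n), H_{k,k_1}(\hat{x}_0^n \mid \hat{x}_1^n, \hat{x}_2^n), d_n(x^n, \hat{x}_1^n), d_n(x^n, \hat{x}_2^n), d_n(x^n, \hat{x}_0^n)) = (0.5503, 0.5586, 0.0038, 0.0505, 0.0483, 0.0036)$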



VII. FUTURE DIRECTIONS 

Simulated annealing was recently employed in [15] to 
design a universal lossy compression algorithm. In this paper, 
we proved that in fact the same tool can be applied to 
devise a universal MD algorithm. We started by defining 
the equivalent of MD problem for ergodic processes, and 
defined the idea of stationary MD coding which includes 
three rate constraints instead of two. Extensions of these 
results to additional distributed coding scenarios are under 
current investigation. 
Appendix A: Outline of the proof of Theorem 2

Outline of the proof of the first part: Let
$(R_1, R_2, D_1, D_2, D_0) \in \mathcal{R}^B$. We need to find $(R_{11}, R_{22}, R_0)$
such that $(R_{11}, R_{22}, R_0, D_1, D_2, D_0) \in \mathcal{R}^P$, and (17)-(19)
are satisfied.

Let $(f^{(n)}, g_1^{(n)}, g_2^{(n)}, g_0^{(n)})$ be a sequence of codes at rate
$(R_1, R_2)$ that achieves the point $(R_1, R_2, D_1, D_2, D_0) \in
\mathcal{R}^B$. Note that for a given code, $(\hat{X}_1^n, \hat{X}_2^n, \hat{X}_0^n)$ is a
deterministic function of $X^n$. Using the same method used in
[12], we can generate jointly stationary ergodic processes
$(\hat{\mathbf{X}}_1^{(n)}, \hat{\mathbf{X}}_2^{(n)}, \hat{\mathbf{X}}_0^{(n)})$ by appropriately embedding these block
codes into ergodic processes. Here the superscript $(n)$
indicates the dependence of the constructed processes on $n$. In
order to code an ergodic process into another ergodic process
using a block code of length $n$, we need to cover an infinite
length sequence by non-overlapping blocks of length $n$ up to
a set of negligible measure, and then replace each block by its
reconstruction generated by the block code. The challenging
part is the partitioning, which should preserve the ergodicity.
This can be done using the R-K Theorem [13], which states that:

Theorem 5 (Rohlin-Kakutani Theorem): Given the
ergodic source $\mathbf{X}$, integers $N$, $n \le N$, and $\epsilon > 0$, there
exists an event $F$ (called the base) such that

1) $F, TF, \ldots, T^{N-1}F$ are disjoint,

2) $P\left(\bigcup_{i=0}^{N-1} T^i F\right) \ge 1 - \epsilon$,

3) $P(S(a^n) \mid F) = P(S(a^n))$, where $S(a^n) = \{\mathbf{x} : x^n = a^n\}$.

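Since the sequence of MD block codes was assumed
to achieve the point $(R_1, R_2, D_1, D_2, D_0)$, the constructed
process $(\hat{\mathbf{X}}_1^{(n)}, \hat{\mathbf{X}}_2^{(n)}, \hat{\mathbf{X}}_0^{(n)})$ satisfies the distortion
constraints given in (14)-(16) at $(D_1 + \epsilon_n, D_2 +
\epsilon_n, D_0 + \epsilon_n)$, where $\epsilon_n \to 0$ as $n \to \infty$. Therefore,
$(\bar{H}(\hat{\mathbf{X}}_1^{(n)}), \bar{H}(\hat{\mathbf{X}}_2^{(n)}), \bar{H}(\hat{\mathbf{X}}_0^{(n)} \mid \hat{\mathbf{X}}_1^{(n)}, \hat{\mathbf{X}}_2^{(n)}), D_1 + \epsilon_n, D_2 + \epsilon_n, D_0 + \epsilon_n) \in \mathcal{R}^P$. Let

$$R_{11}^{(n)} := \frac{1}{n} H(\hat{X}_1^n), \qquad \text{(A-1)}$$

$$R_{22}^{(n)} := \frac{1}{n} H(\hat{X}_2^n), \qquad \text{(A-2)}$$

$$R_0^{(n)} := \frac{1}{n} H(\hat{X}_0^n \mid \hat{X}_1^n, \hat{X}_2^n), \qquad \text{(A-3)}$$

where $\hat{X}_i^n = g_i^{(n)}(M_i)$, for $i \in \{1, 2\}$, and
$\hat{X}_0^n = g_0^{(n)}(M_1, M_2)$. Note that since the encoder
knows $(\hat{X}_1^n, \hat{X}_2^n, \hat{X}_0^n)$, by Theorem 1, $R_{11}^{(n)} \le R_1$,
$R_{22}^{(n)} \le R_2$, and $R_{11}^{(n)} + R_{22}^{(n)} + R_0^{(n)} \le R_1 +
R_2$. The only remaining step is to find the relationship
between $(\bar{H}(\hat{\mathbf{X}}_1^{(n)}), \bar{H}(\hat{\mathbf{X}}_2^{(n)}), \bar{H}(\hat{\mathbf{X}}_0^{(n)} \mid \hat{\mathbf{X}}_1^{(n)}, \hat{\mathbf{X}}_2^{(n)}))$ and
$(R_{11}^{(n)}, R_{22}^{(n)}, R_0^{(n)})$, which is not hard from the way the
processes are constructed.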

Outline of the proof of the second part: Let
$(R_{11}, R_{22}, R_0, D_1, D_2, D_0) \in \mathcal{R}^P$. This means that
there exist processes $\hat{\mathbf{X}}_1$, $\hat{\mathbf{X}}_2$ and $\hat{\mathbf{X}}_0$ jointly stationary and
ergodic with $\mathbf{X}$ which satisfy (11)-(16). Based on these
processes, for block length $n$, we use the following block
coding strategy: For coding sequence $X^n$, describe $\hat{X}_1^n$ and
$\hat{X}_2^n$ losslessly to Decoders 1 and 2 using $n(\bar{H}(\hat{\mathbf{X}}_1) + \epsilon_n)$
and $n(\bar{H}(\hat{\mathbf{X}}_2) + \epsilon_n)$ bits respectively. Given $\hat{X}_1^n$ and
$\hat{X}_2^n$, $n(\bar{H}(\hat{\mathbf{X}}_0 \mid \hat{\mathbf{X}}_1, \hat{\mathbf{X}}_2) + \epsilon_n)$ bits suffice to describe $\hat{X}_0^n$
losslessly to Decoder 0. These bits can be divided into two
parts: the first part will be included in the message $M_1$, and
the rest in the message $M_2$. Decoders 1 and 2 just ignore
these extra bits, but Decoder 0 combines them with the two
other messages to reconstruct $\hat{X}_0^n$. Since $R_1$ and $R_2$ satisfy
(17)-(19), it is possible to do this.

Appendix B: Proof of Theorem 3 
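For an ergodic source $\mathbf{X}$, let

$$\mu(\gamma, \alpha) := \min_{\mathcal{R}^P(\mathbf{X})} \left[\gamma_1 R_{11} + \gamma_2 R_{22} + \gamma_0 R_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_0 D_0\right]. \qquad \text{(B-1)}$$

No coding strategy can beat $\mu(\gamma, \alpha)$ on a set of non-zero
probability. Therefore, the left hand side of (21) is
lower bounded by its right hand side. Therefore, we only
need to prove the other direction. By definition, for any
$(R_{11}, R_{22}, R_0, D_1, D_2, D_0) \in \mathcal{R}^P(\mathbf{X})$, there exist processes
$\hat{\mathbf{X}}_1$, $\hat{\mathbf{X}}_2$ and $\hat{\mathbf{X}}_0$ such that (11)-(16) are satisfied. On the
other hand, if $(\hat{X}_1^n, \hat{X}_2^n, \hat{X}_0^n)$ is generated by jointly ergodic
processes $(\hat{\mathbf{X}}_1, \hat{\mathbf{X}}_2, \hat{\mathbf{X}}_0)$, then for $k = o(\log n)$ and $k_1 =
o(\log n)$ such that $k, k_1 \to \infty$ as $n \to \infty$, $H_k(\hat{X}_i^n) \to
\bar{H}(\hat{\mathbf{X}}_i)$, for $i \in \{1, 2\}$, and moreover $H_{k,k_1}(\hat{X}_0^n \mid \hat{X}_1^n, \hat{X}_2^n) \to
\bar{H}(\hat{\mathbf{X}}_0 \mid \hat{\mathbf{X}}_1, \hat{\mathbf{X}}_2)$. This implies that

$$\limsup \min \left[\gamma_1 H_k(\hat{X}_1^n) + \alpha_1 d_n(X^n, \hat{X}_1^n) + \gamma_2 H_k(\hat{X}_2^n) + \alpha_2 d_n(X^n, \hat{X}_2^n) + \gamma_0 H_{k,k_1}(\hat{X}_0^n \mid \hat{X}_1^n, \hat{X}_2^n) + \alpha_0 d_n(X^n, \hat{X}_0^n)\right] \qquad \text{(B-2)}$$

is upper-bounded by $\mu(\gamma, \alpha) + \epsilon_n$, where $\epsilon_n \to 0$. Combining
these two results yields the desired conclusion.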



